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ABSTRACT 

We discuss the implementation of Bayesian inversion methods in order to recover 
the properties of the intergalactic medium from observations of the neutral hydrogen 
Lyman-a absorptions observed in the spectra of high redshift quasars (the so-called 
Lyman-a forest). We use two complementary schemes (i) a constrained Gaussian ran- 
^ ' dom field linear approach and (ii) a more general non-linear explicit Bayesian decon- 

, volution method which offers in particular the possibility to constrain the parameters 

of the equation of state for the gas. 

The interpolation ability of the first approach is shown to be equivalent to the 
second one in the limit of negligible measurement errors, low- resolution spectra and 
null mean prior. 

While relying on prior assumption for the two-point correlation functions, we show 
how to recover, at least qualitatively, the 3D topology of the large scale structures in 
redshift space by inverting a suitable network of adjacent, low resolution lines of sight. 
The methods are tested on regular bundles of lines of sight using iV-body simulations 
Q ■ specially designed to tackle this problem. 

|H \ We also discuss the inversion of single lines of sight observed at high spectral 

C/3 . resolution. Our preliminary investigations suggest that the explicit Bayesian method 

' can be used to derive quantitative information on the physical state of the gas when 

the effects of redshift distortion are negligible. The information in the spectra remains 
degenerate with respect to two parameters (the temperature scale factor and the 
polytropic index) describing the equation of state of the gas. 

Redshift distortion is considered by simultaneous constrained reconstruction of 
the velocity and the density field in real space while assuming statistical correlation 
between the two fields. The method seems to work well in the strong prior regime 
where peculiar velocities are assumed to be the most likely realization in the density 
field. Finally, we investigate the effect of line of sight separation and number of lines 
of sight. Our analyses suggest that multiple low resolution lines of sight could be used 
to improve most likely velocity reconstruction on a high resolution line of sight. 

Key words: Methods: data analysis - N-body simulations - statistical - Galaxies: 
intergalactic medium - quasars: absorption lines - Cosmology: dark matter 



1 INTRODUCTION 

It has been realized recently that the cosmological mass density of the baryons located in the intergalactic medium (IGM) 
at high redshift is similar to the total cosmological mass density of baryons predicted by primordial nucleosynthesis theories 
(Petitjean et al. 1993; Press & Rybicki 1993; Meiksin & Madau 1993; Rauch et al. 1997; Valageas et al. 1999). Therefore, 
there is probably a close interplay between galaxy formation and IGM evolution. The IGM acts as the baryonic reservoir for 
galaxy formation, while star formation activity in forming galaxies should influence the physical state of the IGM through 
metal enrichment and emission of ionizing radiation. Hence it would be of primary interest to be able to correlate the spatial 
distribution of intergalactic gas with that of galaxies. 

Neutral hydrogen in the IGM is revealed by the numerous absorption lines seen in QSO spectra (the so-called Lyman-a 
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forest). The physics of the gas is remarkably simple: its thermal state is governed by photo-ionization heating and adiabatic 
cooling (e.g., Hui & Gnedin 1997; Weinberg 1999) and its dynamics results from the effects of gravity on large scales and 
pressure smoothing on small scales (Reisenegger & Miralda-Escude 1995; Bi & Davidsen 1997; Hui et al. 1997). Dark matter 
and baryons trace each other quite well and the Lyman-a forest is due to mildly over-dense fluctuations in a pervasive 
medium with density contrasts of the order of 1 to 10. The gas should be distributed along filaments and/or sheets of 
significant extension. 

This is supported by observations of multiple lines-of-sight (LOS) showing that the gaseous complexes producing the 
Lyman-Q forest have large sizes. Indeed, in the spectra of multiple images of lensed quasars with separations of the order of a 
few arcsec (Smette et al. 1995; Impey et al. 1996), the Lyman-a forests appear nearly identical, implying that the absorbing 
objects have sizes >50h^^ kpc.^ Pairs with separation up to 500 h^^ kpc show an excess of absorptions common to both 
LOSs compared to what is expected for an uncorrelated distribution of absorption lines (Dinshaw et al. 1995; Petitjean et 
al. 1998; Crotts & Fang 1998; D'Odorico et al. 1998). This suggests rather large dimensions or better coherence length and a 
non-spherical geometry of the absorbing structures (Ranch & Haehnelt 1995). 

Recent A''-body simulations have provided a consistent theoretical framework for the description of the intergalactic 
medium (Cen et al. 1994; Petitjean et al. 1995; Hernquist et al. 1996; Zhang et al. 1995; Miicket et al. 1996, Miralda-Escude et 
al. 1996; Bond & Wadsley 1998). The simulations are very successful at reproducing the main characteristics of the Lyman-a 
forest: the column density distribution, the Doppler parameter distribution, the fiux decrement distribution and the redshift 
evolution of absorption lines. It has become clear that the Lyman-a forest is a powerful tool to investigate key cosmological 
issues such as: the re-ionization of the universe (Abel & Haehnelt 1999; Schaye et al. 1999; Ricotti et al. 2000); the density 
fluctuation power-spectrum (Croft et al. 1998; Gnedin & Hui 1998; Hui 1999; Nusser & Haehnelt 1999a), the geometry of the 
Universe (Hui et al. 1999) or cosmological parameters (Weinberg et al. 1999). 

Applications to real data have led to interesting constraints on the fluctuation power-spectrum (Croft et al. 1999; Nusser 
& Haehnelt 1999b), cosmological parameters (Weinberg et al. 1999; Theuns et al. 2000) or the physical characteristics of the 
gas (Schaye et al. 1999). However, these studies are presently limited by the amount of information available and show that 
it is important to increase current LOS data sets. 

Two approaches can be considered: (i) increasing the number of LOSs observed at intermediate and high spectral 
resolution in order to improve the precision of the above measurements; large redshift surveys in progress or in preparation 
such as the Sloan Digital Sky Survey (SDSS; e.g., Szalay 2000) the Two degree Field (2dF; e.g.. Pokes et al. 1999) or the 
VIRMOS redshift survey (e.g., Le fevre et al. 1998) should dramatically increase the number of low spectral resolution QSO 
spectra available for analysis; (ii) using groups of QSOs to constrain the 3D distribution of the gas and to study redshift-space 
distortion effects taking into account peculiar velocities in the reconstruction; the ultimate goal would be to increase the 
density of LOSs so that the reconstructed 3D spatial distribution of the gas can be correlated with galaxies observed in 
the same field; the deep imaging surveys planned with MEGACAM (e.g., Boulade et al. 1998) at the Canada- France-Hawaii 
Telescope and follow-up spectroscopy should provide data for such projects. 

It is thus of first importance to prepare the tools needed for the interpretation of the wealth of data that will be provided 
by the planned surveys. Nusser & Haehnelt (1999a) have described a method for the recovery of the real space density 
distribution along one LOS. Using an analytical model of the intergalactic medium, they propose a direct inversion of the 
Lyman-a forest seen in the QSO spectra using an iterative scheme based on Lucy's deconvolution method (Lucy 1974). This 
method yields fields for the density in contrast to Voigt profile decomposition. 

Here we show that these techniques can be generalized to multiple LOSs to reconstruct the 3D density field (see Vergely 
et al. 2001 for a similar application to the 3D mapping of the local interstellar medium). This should help for characterizing 
the structures (filaments, sheets...), determining physical properties of the gas (temperature, peculiar velocity) and discussing 
the cosmological evolution of the IGM. 

This paper is organized as follows: in section 2 we present basic equations describing the relationship between absorption 
along LOSs and properties of the IGM. Section 3 is concerned with sketching the basis for the inversion technique; two 
methods are described, a Bayesian regularized inverse method and a constrained random Gaussian field reconstruction, which 
can actually be seen as a particular case of the first method. Section 4 describes two N-body simulations from which we 
construct simulated data. Section 5 discusses the use of inversion techniques implemented here (i) to recover the 3D spatial 
distribution of the IGM from Lyman-a forest absorption lines on large scales while neglecting thermal broadening; (ii) to 
address the issue of thermal broadening on small scales; (iii) to take into account peculiar velocities and correction for the 
induced redshift distortions. 



2 THE LYMAN-« OPTICAL DEPTH ALONG A LINE OF SIGHT 

The optical depth, Te{w), along the LOS £, at projected position xx.f = (ye, zi) on the sky, and in velocity space, w, is related 
to neutral hydrogen density, nni, by : 

I \ '='^0 [ f ( /"^°° "•Hi(a::,xx) / [w - x - Vj,{x,:x.^)f\ \ 2 e ^ t ^^\ 

= ^ I I \ I —n ^exp dx Ud(xx - xx,f)d x_l , ^ = 1 • • • L, (1) 

* where /lys is the Hubble constant expressed in units of 75 km/s/Mpc. 
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where ctq is the effective cross-section for resonant hne scattering, H{'z) the Hubble constant at mean redshift J and Vp{x) is 
the projection of the peculiar velocity along the LOS. The double sum over xx corresponds to the integration in the directions 
perpendicular to the LOSs. 5d is the 2D Dirac distribution. The Doppler parameter &(x) is considered a function of the local 
temperature of the IGM at point x = {x,x±) where x is the real space coordinate expressed in km/s [= rH{z)]. 

This work is concerned with assessing the inversion of equation with the aim of constraining the 3D fields, nHi(a;,xx), 
6(x,xx) and Vp{x,x±), from the knowledge of a bundle of lines of sight, £ — 1 • ■ ■ L. 



2.1 The model 

To relate the gas density, the dark matter (DM) density and the temperature, we follow the prescriptions of Hui & Gnedin 
(1997). We refer to this paper for a detailed derivation of the relations given below. We assume that baryons trace dark matter 
potential (Bi & Davidsen 1997) and are in ionization equilibrium. Therefore, 

nm oc Pdm 7"" '' , (2) 

where tihi is the neutral hydrogen particle density and pdm the dark matter density. 

Considering that shock heating is unimportant for the thermal budget of the intergalactic gas (Hui & Gendin 1997), an 
effective equation of state describes the physical state of the gas, 

T(x)=Tf^^^V' . (3) 
V Pdm ) 

The parameter /3 is in the interval < /? < 0.31 (this upper bound corresponds to the asymptotic value at 2; = far from 
re- ionization). Therefore, 

nHi(x) = TiHi 1 ) with a scaling q = 2—1.4/? . (4) 

If there is no turbulence then the Doppler parameter h(x) at each position is due to thermal broadening only, 

'<-)-"™-/^^(^)^ <^> 

and equation (|l|) becomes 

/pdm(2;,xx)\" / {w - X - Vp(x,y.x}if 
— I exp — C2 — 

V Pdm / V [PDM(a;,xx)/pDM] 

The parameters ci and C2 depend on the characteristic temperature of the IGM 



nM=A{z)c^lll (^^£^±l±iZ) exp ( -C2 '^-r :w 1 '^"(^^ - ^^.^) d'^x ■ (6) 




-1 



^2 T \ J ./_s _ ccro T 



where J is the ionizing flux assumed to be uniform. Here the temperatures are given in Kelvin. The value of A{z) is fixed by 
matching the observed average optical depth (~ 0.2 ai z — 2) 



2.2 The regimes of interest for the reconstruction 

Several regimes will be considered in § 5 when performing the inversion: 

(i) Small scales or high resolution {£ ^ 0.1 Mpc): in this regime, and although it might not necessarily be a good approxi- 
mation (e.g. Hui, Gnedin & Zhang 1997), we simply assume that redshift distortion is negligible [vp = in equation (^)] and 
reconstruct the density field in redshift space while constraining the equation of state. 

(ii) Large scales or low-resolution {£ ^ 1 Mpc): in this regime, applicable to low resolution spectra, thermal broadening can 
be neglected and equation (|l|) simply becomes: 

r \ Af-\ f f f Pomiw ~ Vp{x{w,x±)),x±)\" 2 <■ « 1 r /q\ 

Te(w) ^ A{z) / / ^ -] dD(xx - xx,«)d XX , for ^ = 1 • • • L , (8) 

JJ\ Pdm J 

where x{w, xx) is defined implicitly by the equation x — w—Vp(x, x±). Our efforts in this regime will focus on 3D reconstruction 
of the density in redshift space, i.e. with = in equation (H) and known equation of state for the gas. In principle, redshift 
distortion should not be neglected, but this does not change significantly the topology of large scale structures, at least at 
weakly non-linear scales, making thus such simplified analysis still relevant. 

(iii) Intermediate scales or intermediate resolution {0.1 ^ £ ^ 1 Mpc): Redshift distortion will not be neglected anymore 
and equation (^ will be used to determine simultaneously the density and velocity fields, assuming that the effective equation 
of state is known. 
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Note that we neglect here the statistical scatter away from equation and in particular the departure from a unique power 
law for larger over-densities. 



3 DECONVOLUTION OF THE IGM 

The basic idea is to interpolate between adjacent LOSs the fields which are measured along the LOSs. This first requires 
assumptions on the nature of the fields. In fact, strictly speaking, our ability to say anything away from the LOSs could 
be questioned, since to the best of our unbiased knowledge, space between the LOSs could well be empty. Moreover, the 
inversion of equation (|l|) is obviously not unique and additional assumptions must be made in order to reduce the parameter 
space. For example, the Doppler parameter and/or the peculiar velocity fields are taken to be described by a simple function 
of the sought density field, nm- Indeed, dynamical considerations supported by numerical simulations suggest there exists a 
statistical relationship between over-densities and the corresponding projected velocity field, while temperature and density 
are also statistically related by an equation of state. 

This paper addresses these issues via two techniques: 



(i) a general, explicit Bayesian deconvolution method (§ 3.1), capable of dealing with fields and priors such as a given 



equation of state. This method should allow one to deconvolve thermal broadening non linearly while accounting for peculiar 
velocities and therefore to reconstruct the density /velocity field along a LOS and constrain the equation of state of the gas. 
With several LOSs, it should simultaneously be possible to obtain the three dimensional density field. 

(ii) a constrained Gaussian random field linear approach (§ [3.2| ), which relates the peculiar velocities projected along the 
LOS to the 3D density field or directly the 3D density field to the LOS density in redshift space. It requires prior knowledge 
of the logarithm of density in redshift space along each LOS but can be used after applying method (i) to each LOS. 

In fact, method (i) is very general and can be applied in many ways, which mainly differ in the priors taken for the statistical 
properties of the density and velocity fields. Method (ii) corresponds to a given choice of strategy for the 3D density /velocity 



reconstruction step: like Wiener filtering, it is a particular case of method (i) (§ 3.3) 



3.1 A non-parametric explicit Bayesian regularized inverse method 

We aim to invert equation (1), i.e. reconstruct the density field nm and the velocity field Vp{x,:x.±). To that end, we take a 
model, g, such as equations (3)-(^, which basically relate the Doppler parameter b and the gas density nm to the dark matter 
density, pdm, and obtain equation (^). In this equation, there are a certain number of parameters to be determined, which 
can be continuous fields such as the dark matter density or the velocity field, or discrete parameters such as a and /3. This 
set of parameters can be formally described as a vector, M. The goal here is to determine M by fitting the data, D, i.e. the 
absorption spectra along the A*' LOSs. 

Since the problem is under-determined, we use a Bayesian technique described in Tarantola & Valette (1982a; see also 
e.g. Craig & Brown 1986; Pichon & Thiebaut 1998). In order to achieve regularization, this method requires prior guess for 
the parameters, or in statistical terms, their probability distribution function, /prior (M). 

Using Bayes' theorem, the conditional probability density /post(M|D) for the realization M given the observed data D 
then writes: 

/po.t(M|D) = £(D|M)/prior(M) , (9) 

where C is the likelihood function of the data given the model. 

If we assume that both functions £ and /prior are Gaussian, we can write: 

/post(M|D) = ^exp (-i(D - g{M))^ ■ C^' ■ (D - ^(M)) - i(M - Mo)^ ■ Co ^ ■ (M - Mo)) , (10) 

with Cd and Cq being respectively the covariance "matrix"^ of the observed data and of the prior guess for the parameters, 
Mq. ^ is a normalization constant. The superscript, _L, stands for transposition. The first argument of the exponential in 
equation ( p^ corresponds to the likelihood of the data given the model and the parameters^ while the last correspond to 
the likelihood of the parameters given the prior Mq. Note that the assumption of a Gaussian field for /prior could be lifted, in 
particular to account for the presence of contrasted filaments (i.e. we could introduce 3 point correlation functions, or higher 
order statistics to account for the fact that, say, the prior likelihood of aligned overdensities is higher). A possible method for 
maximizing the posterior probability given in equation (^^ is sketched in Appendix In a nutshell, the minimum, (M) , of 
the argument of the exponential in equation (^o|) is shown by a simple variational argument (Tarantola and Valette, 1982a ; 
1982b) to obey the implicit equation: 

(M) = Mq -f Co ■ • (Cd + G ■ Co ■ G-^)"' ■ (D + G ■ ((M) - Mo) - ff((M))) , (11) 
t Formally defined on continuous + discrete fields, as is the vector M. 

i Note that the model g taken here would correspond to equation (p|) instead of equations (pb-M as said earlier. 
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where G is the matrix (or more rigorously, the functional operator) of partial derivatives of the model g(M.) with respect 
to the parameters. Note that, under the assumption of Gaussianity, the extremum (M) is at the same time the most likely 
constrained value of the parameters vector and its mean value. The posterior covariances of the parameters. Cm, can be 
computed from equation (|A6|). 

The method can in principle be iterated, taking in equation ( pj| ) Mo — M and Co ~ Cm to compute a new value of M 
until possible convergence. However, in this paper, we did not test this procedure. We might then wonder how the choice of 
the prior for the param eter s, Mq and their covariance matrix, Co, affect the final result, (M). 

We will show in § 3.3 that for null prior, Mo = 0, the method proposed here is equivalent to Wiener filtering if the 



model is linear [(?(M) = CM]. However, we may include more prior information when possible. For instance, if in the field of 
interest, redshifts of galaxies and clusters, gravitational lensing or SZ data, etc., are available, we may explicitly incorporate 
these additional constraints in the prior Mo instead of extending the data set, D. More realistic expressions accounting 
for the statistical scatter around equation and a possible slope break are also possible. Additional information about 
our prejudice on the evolution of large scale structures can also be incorporated in the description of the prior probability 
distribution function to account for, say, dynamically induced non Gaussianity. 



3.2 Constrained random field reconstruction 



The explicit Bayesian method described above can be applied to the data to re construct along each LOS the density field in 
redshift space while constraining the equation of state, as illustrated in § |5.3| . When dealing with the large scale regime of 
2.2, equation (^) applies and the density contrast, defined by 



6{x) = log [pdm/Pdm] ~ (PDM - Pdm)/Pdm . (12) 
reads, along each LOS and in redshift space {x = w), 

5e{x,^^) = ^\og(^) . (13) 



This section focuses on recovering the 3D density field in redshift space or in real space, the latter case requiring treatment 
of peculiar velocities. To achieve that, we use a constrained random field method (e.g., Hoffman & Ribak 1992). Broadly 
speaking, such a method assumes that part of a model (here, the density in redshift space along the LOSs) is fixed by the 
observations. It then provides the relation between these "data" and the most likely value of the remaining part of the 
parameters (here, the density between the LOSs and the full 3D velocity field). This method requires some assumptions on 
the statistical properties of the searched fields. The idea is to consider large enough scales so that non-linear effects have not 
driven dynamically the system too far away from its initial conditions which we assume to be Gaussian distributed.^ The 
theory of constrained random Gaussian fields is well known (e.g., Rice 1944, 1945; Longuet-Higgins 1957; Adler 1981; Bardeen 
et al. 1986 and references therein) and application to our problem is detailed in Appendix |b|. 

We assume that the constraints are distributed along a bundle of L LOSs, i.e. that the density contrast [defined above in 
equation (p^)] takes the values \&i(x')\i^\...l along the LOSs. Then, using linear perturbation theory and the Gaussian nature 
of underlying fields, we can write the probability distribution function of the 3D velocity or density field in redshift space in 
terms of these constraints and of the 3D power-spectrum of the density field, P3_D(k). A prior is thus required for P3d(Ic), but 
an iterative procedure can in principle be implemented, using the Pzd{^ measured in the reconstructed data after redshift 
distortion deconvolution as a new prior. 

We d emonstrate that the most likely velocity (i^p)^ along the line of sight I is given by the linear relationship [equa- 



tion ( |B14| )] 

{vj,) ^(x) ^'^^ \ Kui(x,x)5i:{x)Ax- , or discretely (vp) = C„i ■ C^/ ■ <5 , (14) 



where the kernel, Kiii(x,x'), is a simple function of the assumed 3D power spectrum given by equation (B14), while Css and 



Cv5 are respectively the log density auto correlation, and the mixed log density-velocity correlation given by 

Cm = ({(5,:(5j)),^i...„j=i...„ , C,i = ({i;,(5j)),^^...p^^.^-^,..„ , (15) 

assuming we know the log-density at n points in space (p stands for the number of points at which we seek the velocity). 

To obtain the density in real space along one LOS, it is possible to rely on the explicit Bayesian method once more, 
by using for the model, g, equation ^ or equation (^) with Vp given by equation (|l^. This "strong prior" regime will be 



tested against simulations in § 5.4.2, Of course, the Bayesian method could as well allow us to perform the simultaneous 3D 
reconstruction of the density field. 

The constrained random field machinery can also be used to reconstruct the 3D density field in redshift space (or in real 
space once the density along each LOS is deconvolved from redshift distortion), <^(5''^°'^(x). This is particularly relevant at 
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low spectral resolution which corresponds to the large scale regime, where equation ( p^ can be directly used for 5i{x). One 
obtains [equation ( JB15[ )] 

(5(^°))(x,) = J2 f Kif\^,,^',)S,ix')dx' , or = C,<3d), ■ CJ^ S, (16) 



where the kernel, Tf^^'^' (xa, x'^), is also a function of the assumed 3D power spectrum given by equation (BIE). Css is given 
by equation (jl^, C_j(3D)^ is the mixed LOS-3D over-density correlation given by C^(3d)^ = {J^Sf'^^Sj 



z — 1 ■ - -p, J — 1- ■ -n 



3.3 Overlap between the two methods and connection with Wiener filtering 

The above extrapolation technique is restricted to quasi linear analysis in redshift space and unsaturated absorption lines, 
since it assumes a priori that the density is known along each LOS and that it is Gaussian distributed. As such, constrained 
random fields methods cannot be applied directly to equation (hi) which involves a double non-line ar c onvolution over the 



underlying density both explicit (via nni) and implicit (via Vp). The Bayesian approach sketched in § 3.1 is more general and 
makes less stringent assumptions. In particular it should provide means of applying redshift distortion correction on the fly 
while accounting for temperature induced blending. We nonetheless show that, for linear models, when the prior dominates, 
the extrapolation ability of equation (^^ reduces to constrained random field extrapolation, while, in contrast, in the zero 
prior limit, it reduces to Wiener filtering. We also show how the covariance of the prior log-density and velocity can be adjusted 
to fix a unique linear relationship between the sought density field and its redshift distortion. 

Let us start from the explicit Bayesian method. If the prior is null, Mq = 0, the error in the measurements negligible, 
C(i ~ 0, the model linear, (/(M) = G • M, equation ( [ll| ) becomes 

(M) = Co ■ G-^ • (G- Co • G-^)"' -D. (17) 

When recovering the 3D density field from the measured density along the LOSs, Co = C^(3d)5(3d), the linear operator G 
operates then simply like a Dirac comb on a field 77: 



Gfq = J (5d(xx - xx£)j7(x) dxx , (18) 
so that 

Co ■ G^ = C^(3D)5 and G • Co ■ G^ — Css , which implies for equation (17); (^S^'"^^^^ — C^(3d)^ • {Css)~^ ■ S . (19) 

Equation ( [l9[ ) is identical to equation (^6|) . Note incidentally that if the prior is null and the model linear but if the errors in 
the measurements are accounted for, equation (|l^) becomes 

(M) = Co ■ G-L ■ (G ■ Co ■ G-L + Cd)"' ■ D = (G^ ■ C^' ■ G + Co ')"' ■ G-^ ■ C^ ' ■ D , (20) 

which corresponds to Wiener filtering (Wiener 1949; Zaroubi et al. 1995). In other words, when the model is linear, our method 
is equivalent to Wiener filtering applied to M — Mq. When we seek to invert for both S and Vp (hence imposing a weak prior 
on the field), 

M=(<5,vp), (21) 
The penalty function [corresponding to the log of the prior in equation (p^)] can be re-arranged [cf. equation (B2)] 



(M - Mo)^ • Co ' ■ (M - Mo) = (vp - C^s ■ Cjl ■ 5)^ ■ (C,„ - C^s ■ Cjl ■ C^s^) ' • (vp - C,* • Cjl ■ S) . (22) 
The strong prior regime, mentioned in § 3.2 and tested in § ^.4.2| , is therefore a sub-case of equation ( p^ ) where 



C„i, ^ C^s ■ Cgg^ ■ C„5^ implying Vp ^ C^s ■ C^/ • S , 

i.e. Vp will take its most likely value as was assumed in equation (p^. 

Both the explicit Bayesian method and the constrained random field reconstruction require detailed description of a prior 
model for the large-scale structure of the IGM in order to fix Mo, Co, P^oi^), plus additional relationships such as those 
sketched in § |^. As mentioned earlier, these methods can be iterated with new priors measured in the reconstructed data, but 
we have not tested the convergence of such a scheme and leave that to future work. 

4 NUMERICAL SIMULATIONS 

To test our methods we use two standard Cold Dark Matter A''-body simulations. The gas distribution is derived from the 
dark matter distribution, using simple recipes described in § 2 and based on previous works (e.g., Hui & Gnedin 1997; Nusser 
& Haehnelt 1999a). As discussed in Analysis of more realistic numerical simulations, taking fully into account the details of 
the gas dynamics is left for future work. Many aspects of the reconstruction problem do not strongly depend on the detail of 
the gas dynamics. 
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Figure 1. The dark matter distribution in the small simulation box, S, at 2 = 2 (see Table |l| and text). The color scales roughly 
logarithmically with the projected density. Darker regions are denser. 



The simulations were run with a Particle-Mesh (PM) code, fully vectorized and parallelized on SGI-CRAY architecture 
with shared memory.^] The characteristics of the simulations, S and B, which involve respectively ~ 32 and ~ 16 millions 
particles, are given in Table |l|. The cosmological parameters are inspired from Jenkins et al. (1998). The particles were laid 
down on a mesh with the same shape as the grid used to compute the forces. Then the Zel'dovich (1970) approximation was 
used to perturb the positions of the particles and to set up Gaussian initial conditions with the appropriate power-spectrum 
for standard Cold Dark Matter (CDM). This was done in a similar way as in the COSMICS package of Berstchinger (1995). 
To avoid effects of transients (e.g., Scoccimarro 1998), the simulations were started at high redshift z = 255 and evolved 
until the desired redshift, z = 2. Figs. |l| and |^ display the corresponding dark-matter distribution. A detailed analysis of the 
power-spectrum and the variance of the density field measured in the simulations is presented in appendix ^. 

The spatial comoving resolutions of simulations S and B are Ag ~ 4.9 and 40 km s~^ respectively, which corresponds to 
physical resolutions ~ 8.5 and 68 km s~^ at z = 2. This is to be compared with the maximum possible pixel resolutions of 
the instruments available on the VLT: UVES, A ~ 3 km/s, and FORS, A ~ 100 km/s. However, the actual resolution of the 
simulation depends on the physical parameter of interest and is always worse than the mesh resolution. For density related 
processes, we can expect the PM simulation to be sufficiently accurate at scales as small as ~ 2Ag, although dynamics can 
actually be contaminated by softening of the forces on scales as large as 6Ag (Bouchet et al. 1985). For velocities, which are 
quite sensitive to resolution, numerical comparisons between PM simulations and higher resolution codes show that results 
are correct within ~ 25 per cent at scales close to Ag (e.g., Colombi 1996). Concerning the gas dynamics, density fiuctuations 
are expected to be damped out below the Jeans length, and therefore it is not necessary to have a spatial resolution much 
better than this cut-off scale. For example, the thorough analysis of Gnedin & Hui (1998) shows that this scale is of the 



^ This program is an improved version of an older code (Bouchet, Adam & Pellat 1985; Alimi et al. 1990; Moutarde et al. 1991; Hivon 
1995). It uses for better performances a "predictor-corrector" (e.g., Rahman 1964) implementation of the time-step (instead of the 
traditional "leapfrog", e.g., Hockney & Eastwood 1981). It is still in construction but available on request by email at nic@iap.fr. 
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Figure 2. Same as Fig. hi but for the large simulation box, B. 



Table 1. Characteristics of the TV-body experiments. 



Model flo 


A 


h 


r 






L 


S 1.0 
B 1.0 


0.0 
0.0 


0.5 
0.5 


0.5 
0.5 


0.51 
0.51 


512 X 256 X 256 
1024 X 128 X 128 


50 X 25 X 25 
800 X 100 X 100 



Model: "S" and "B" stand for "small" and "big" respectively. 
Ho • value of the density parameter of the universe. 
A: value of the cosmological constant. 

h: parameterizes Hubble constant, Hq = lOOh km/s/Mpc. 

F: shape parameter of the initial power-spectrum (see, e.g., Jenkins et al. 1998 for details). 

the linear variance in the dark matter at present time in a sphere of radius Sh~^ Mpc (to fix the normalization). 
Np-. size of the grid used to compute the potential and the forces; also the number of particles. 
L: dimensions of the rectangular periodic box in comoving Mpc. 



order of 50 — 100 comoving kpc, i.e. 5 — 10 comoving km s^^. This roughly corresponds to the spatial resolution of the S 
simulation (at least for density-related quantities). In this respect, the resolution of the B simulation is not high enough, and 
this simulation is only used to test reconstruction of weakly non-linear structures. 

In addition to small scale softening and limited resolution, discreteness effects represent another source of concern, 
particularly in under-dense regions. We apply adaptive Gaussian smoothing to the particle distribution as follows. The mean 
quadratic distance, di, between each particle, i, and its six nearest neighbours is computed. This sets a smoothing length, 
£i = di, i.e. the Gaussian filter associated to particle i is Wi-{r) oc exp(— r^/2^f) within 3£i after appropriate renormalization. 
In practice, the smoothed density (or mass weighted velocity) is computed on a grid chosen here to be the same as the 
simulation grid. Each cell, j, is subdivided in A'^^ sub-pixels, kj, corresponding to positions Xk^, with = 3. The contribution 
of particle i to the grid site j writes 

Q,,cx J2 W,^{\r-Xk,\), (23) 

kj.\r — Xf^j I <3^ j 

with the appropriate normalization ^ . Cj^i = where is the mass of particle i. 
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Figure 3. Top panels, from left to right: The recovered log density versus the real (simulated) log density as a function of the distance 
between the LOSs, -LloSi labeled: as expected the bias increases with -Llos; Middle panels, from left to right: the model and the 
reconstructed density for -Llos = 2.5,4 and 5.5 Mpc comoving; Bottom panels, from left to right: a slice of 1x80x80 Mpc across the 
simulation and the reconstructed fields (the scale on the panels is in pixels) . Most of the small scale structures are lost in the reconstructed 
field. The large scale topology is however recovered. The rounded features in the reconstructed density are an artifact of the interpolation 
method. 



5 APPLICATION 



In this section, we apply the methods discussed in § ^ to simulated Lyman a spectra extracted from the A'^-body simulations 
[using equation (^]. 

Our preliminary analyses are organized as follows. In § 5.1, we give some details on the models and the priors used for 



both the Bayesian method and the constrained random field reconstruction. Section 5.2 deals with 3D reconstruction of the 



density field. We first test the constrained random field method in a regime where the density along each LOS is supposed to 
be known. Next, we test the Bayesian approach. The la tter method does not rely on such a strong prior for the density, and 
is first applied to the large scale regime discussed in 



2.2 



where thermal broadening can be neglected. Moreover, redshift 
distortion is not taken into account. In section 5.5 , we apply the Bayesian method to constrain the equation of state of the 

2.2 but neglect redshift distortion again for the sake of simplicity, 

which assume 



5.4 



gas. We consider the small scale regime as discussed in 

although peculiar velocity effects should realistically be accounted for. These velocities are dealt with in 
in turn that the equation of state of the IGM is well constrained. We analyse the efficiency of velocity reconstruction versus 
number of LOSs, and test Bayesian reconstruction in the frameworks of strong and floating priors. 

The reader will notice that for each problem considered, we neglect in turn either redshift distortion or thermal broadening. 
Accounting simultaneously for both effects can in principle be achieved with the explicit Bayesian method or a combination 
with the constrained random field reconstruction. However, our main goal here was to illustrate the method and to pin down 
various effects at each step of the reconstruction, concentrating on one particular property of the IGM, such as the structures 
of the 3D density field, the equation of state, or redshift distorsion. More general applications will be developed in future 
work. 



5.1 The priors 
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5.1.1 Explicit Bayesian method 

The Gaussian Bayesian prior [equation ([lo|)] is fully described by the first two moments: the prior choice for the parameters 
of the model, Mo, and its covariance, Co. 

For the model we choose the following combination of fields and discrete parameters: 

M= [7(a;,xx),«p(a;,xx),r,/3] . (24) 

Function 7(3;, xx) is defined as 

PDM(a::,Xx) 



Pdm 



= Dq{x,-x.±) exp [7(3;, xx) 



(25) 



so that positivity of density is insured. Here, -Do(a;,xx) is an arbitrary function (specified later) which fixes the value of the 
prior for pDM(a;, xx)/pdm) when 7(3;, xx) = 70 = 0. Note that A{z) is assumed to be known throughout the paper. 
For the prior, we take 



Mo = [0,0,ro,/3o] , 



(26) 



where the values of Tq and /3o will be given in § 5.3 



We derive the prior covariance operator Co either in an ad hoc manner (§ 5.2.2, 5.2 and 5.4.2) or from the simulations 
(§ 5.4.3). In the first case, C-yy, is chosen to obey : 



,XxJ 



: exp 



exp 



Fx ■ 



(27) 



where and are natural lengths in the inversion and govern the level of smoothness of the reconstruction. Typically, 
will be of order of the mean transverse distance between two lines of sight. The optimal choice for depends on the 
problem considered. If peculia r velo city effects are neglected, can be taken as small as the maximum scale between spectral 
resolution and Jeans length (§ |5.2.2 and § 5.J ). In that case, no small scale information is lost along the LOSs. However, when 
redshift distortion is to be taken into account (e.g. § [5.4.2| ) , it is necessary to have a smoother prior to stabilize the inversion, 
typically the length marking the transition toward the non-linear regime (in other words, the typical size of clumps). 

The parameter ct^ may, if required, depend on position. On average, it corresponds roughly to the variance of 7 in a 
rectangle of volume i,x^T- It governs indirectly by how much the reconstructed field, (M), is allowed to float around the prior 
Mq while solving equation (y_l|) with the iterative method detailed in Appendix A. When peculiar velocity effects are neglected, 
this parameter can be t aken t o be rather large, of the order of 0.2. Otherwise, the inversion process is more complicated: 
details will be given in § 5.4.2. Exponential correlation functions turned out to be more appropriate than Gaussian ones in 
order to recover filamentary structures: the covariance kernel given in equation (^) is steeper, which allows us to take into 
account high density fluctuations. 



5.1.2 Constrained random field reconstruction priors 



The constrained random fleld reconstruction method, applied in § 5.2.1, § 5.4.1 and § 5.4.2, also requires values for the prior 
covariance matrix Co, which is taken to be tho se measured in the simulations, as detailed in Appendix^ Some of the biases 
involved in this choice are discussed in 5 5.2.J . 



5.2 Large scales structures: tomography of the IGM 



We apply the two methods described in § 3 to recover the large scale structures in simulation B. For this purpose, we use a 
network of equally separated LOSs along which we simulate spectra in accordance with equation (^) ( as shown in Fig. ^ 
while varying the separation. We proceed in two steps: we first ignore all issues related to finite signal to noise ratios, thermal 
broadening or line saturation, and use constrained r andom fields to extrapolate the density away from the LOSs assuming 
that this latter is fully determined along the LOSs (§ 5.2.1 ); we then illustrate the Bayesian technique, which does not suppose 
that the density along the LOSs is known (§ 5.2.2). In the latter case, only t he larg e scale regime is considered [i.e., the regime 
(ii) discussed in § 2.2 1 and redshift distortion is neglected (vp = 0). Section 5.2.3 discusses shortcomings of the two methods 
and reahstic extensions. 



5.2.1 Constrained random field 

Let us flrst consider redshift space and assume that we have derived the density on each LOS using for example equation ( p^ . 
Recall that the most likely 3D density away from the lines of sight obeys equation (p^). The covariance matrix of the prior, 
Co — Css, is shown on the top of the bottom right panel of Fig. ^. We present the results of a reconstruction of part of 
simulation B in Fig. |^. For this flgure, we used the discrete form of equation (|l6|), on a regular network of overlapping sub-grids 
of size 20 X 20 X 20 pixels such that the centers of adjacent sub-grids are separated from each other by 10 pixels. The value 
of the reconstructed density on one pixel is obtained by a weighted interpolation of the recovered density on each sub-grid 
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Figure 4. Density contrast reconstruction using the Bayesian algorithm from a set of 9x9 lines of sight taken through simulation B. 
The distance between two adjacent lines of sight is equal to 2.4 Mpc comoving. Each panel represents respectively on the left the 
reconstruction and on the right the simulation. Dark regions correspond to over-dense regions. The filaments are well recovered. 



containing this pixel, the weight being inversely proportional to the distance of the pixel from the center of the sub-grid 
considered. This procedure insures smoothness of the reconstruction while keeping the size of the matrices reasonable. The 
top panels of Fig. s] illustrate the bias in the extrapolation procedure as we vary the distances between LOSs, the middle panels 
display the 3D reconstructed iso-log densities corresponding to 5 = 0.2, while the bottom panels show a slice through this 
field. The large scale filaments are recovered for all separations investigated, but small scale structures disappear beyond 2.5 
Mpc comoving of separation. The topography of the structures is well described. As expected the density is poorly recovered 
for the largest separations. 



5.2.2 Bayesian reconstruction: line saturation and finite signal-to-noise ratio (SNR) 

Choosi ng sim ply Do = I in equation (^, our model g, on pixelized data, reads [equation with Vp = 0, see also Ap- 
pendix 



D1.2 



(28) 



with a fixed equal to 1.7 Here, wu is the velocity at bin i corresponding to the LOS labelled I and 7(2;, xx) is the only 
parameter, for which the prior covariance is given by equation (fi'q). The parameters a-y, i^x are are resp ective ly chosen 
equal to 1, twice the resolution and 1.5 times the distance between LOSs. The matrix G is given in Appendix D1.2. Errors in 
the simulated data are modeled as follows. We assume that they are uncorrelated, so that the covariance error matrix Cd is 
diagonal, with elements given by 



+ 



-I 



1 



+ c^o exp(2r) , 



(29) 



" ~ F2 ~ (S'/iV)2 f 2 (5'/Ar)2 

since the observed flux is simply; ¥{111) — exp(— r (w)). Equation (||) states that the error on the flux has two origins: a 
constant signal to noise ratio component and a residual instrumental noise, cro, which dominates at large optical depth. In 
the inversion illustrated in Fig. |^, we use a SNR of 25 and a residual error of magnitude 0.01. 

The reconstruction of filamentary structures is only effective in the regime where the distance between lines of sight is of 
the order of 1-3 Mpc comoving. Beyond this limit, the isotropic method presented here is insufficient to recover the structure 
of the IGM [such anisotropic features may be described by higher order correlation functions and stronger assumptions relying 
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Figure 5. Left panels: Inversion using different equations of state. The upper panel shows a portion of simulated spectrum through S. 
The equation of state used corresponds to equation with T = ft = 10"* K, /3 = /3t = 0.2. Peculiar velocities are not considered. The 
lower panel shows the simulated density as black dots. The density recovered using the same equation of state is plotted as a solid line; it 
is apparent that even the internal structure of absorption blends is recovered. Other curves correspond to the results of inversions using 
various lower values of f at fixed /3 = 0.2. The effect of lowering T is to give smaller values for reconstructed density with a reduced 
< 1. If, on the contrary, T > Tt, one obtains 2> 1. Right panel: Map of convergence {x^ < 1) or divergence (x^ 2> 1) for inversions 
using equation (pOl) with different values of f and /3. The LOS is the same as in left panels. 



on a prior different from equation (|lC|)]. Inherent to the method is the hmitation that density fluctuations at scales smaller 
than the separation between LOSs are damped out by the reconstruction. Also, the probability to intersect a given strong 
over-density is inversely proportional to the amplitude of the over-density. In other words the information regarding rare high 
over-densities is simply not sampled enough by the lines of sight. A related effect is induced by flux saturation in the spectra 
depending on the spectral resolution and the SNR. For instance optical depths of r = 5 or 10 will correspond to very different 
over-densities but very similar (~ 0) fluxes. Note finally that for simplicity we have made use of Gaussian line profiles when 
Lorentzian would have been more appropriate. 



5.2.3 Discussion 



In the reconstruction of 5 5.2.1 



the density is assumed to be known along the LOSs, together with the covariance matrix of 
the 3D lo g-de nsity field. At low spectral resolution, we may neglect both thermal broadening and peculiar velocities and use 
equation (y_3|) to determine directly the density in redshift space from the Lyman-a forest along each LOS. At high spectral 
resolution, thermal debroadening and redshift distortion deconvolution could in principle be achieved simultaneously with 
the explicit B ayes ian method or a combination of the Bayesian method with the constrained random field reconstruction, as 



discussed in 5 3.2 



and shown below. 



Note also that our prior for the 3D covariance matrix in § 3.2.1 is optimal: it is measured directly in the simulation. In that 



sense, our reconstruction is biased since we use part of the correct answer in advance. Moreover, we go beyond Gaussian linear 
approximation, since we work on /og-density, which contributes to improve even more the reconstr uction . In real observations, 
we would not have a prior as good as that chosen here at our disposal. However, as shown in § 5.2.2, the results from the 
explicit Bayesian reconstruction, which rely on a much weaker prior, equation (p7|), give very similar results to the constrained 
random field reconstruction. This shows that the non linear features present in the measured correlations do not play an 
important role in our ability to carry out the inversion on the scales explored here. Finally, it may be worth mentioning again 
that the methods should be iterated, using for new priors and covariance matrixes the measured ones in the reconstructed 
field. 



5.3 Small scales: the IGM temperature 

We now aim to determine the eq uatio n of state of the IGM by considering the inversion of a single LOS observed at high 
spectral resolution [regime (i) in § 2.2 1. The inversion of the density, velocity and temperature fields from a single LOS is not 
unique (Theuns et al. 1999a; Hui & Rutledge 1999). Indeed, the same spectrum can be reconstructed with different equations 
of state and density distributions as illustrated by Fig. ^ Neglecting peculiar velocities for the sake of simplicity {vp = 0), the 
problem reduces to the determination of two parameters T and /3 and one unknown field, 7. The simultaneous determination 
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of these parameters and the field remains a degenerate problem. As detailed in appendix [D1.1| , our model, g, on pixelized 
data reads, from equation (^), 



g«(7) = A{z)ci 



(-Do(a;,xx) exp[7(2;,xx)])" ''exp ( -C2 



' {Do{x,x±) exp[7(a;,xx)])^'^ 



dx Sd{'X-± —x±i)d xx.(30) 



Here, A{z) is arbitrarily fixed to A{z) = 0.7 as explained in § 5.1.1, a = 2 — 1.4/3 [equation (j^] and ci and C2 are functions 
of T [equation (^]. The function Do{x,x±) is chosen to be 

The prior covariance matrix C-y-y is given by equation ( ]2^ ) with ^ oo. Here and are chosen equal to 0.2 Mpc comoving 
and 0.2. 

We conduct our analyses as fol lows. We first simulate a spectrum along one LOS with a given real pair (/9t, Tt)- The noise 
matrix Cd is the same as in § 5.2.2 with a {S/N, ctq) = (50, 0.05). We then invert this LOS for 7(2;) while varying (/3, T) over 
a given range of realistic values. In that sense, the only effective parameter in the inversion is the field 7. For each value of 
(/3, T), we compute the reduced x^y (D ~ 5(M))^-C^^.(D — g{M.)) in equation (p^), as shown in right panel of Fig. ^. The 
value of (/9t, Tt) is shown by a white cross. The T) plane is divided into two regions separated by a straight borderline, 
one with ^ 1 (corresponding to large values of T) and the other one with < 1- This arises because the absorption lines 
are indeed thermally broadened and resolved. When T > Tt, the absorption features in the data are narrower than the model 
and cannot be fitted anymore. 

As expected, the real parameters stand on the borderline between convergence and divergence: these parameters corre- 
spond to a good fit. We cannot however distinguish -using a x^ criterion- between pairs of (/3, T) on this borderline. Even 
though the degeneracy is not completely lifted, this analysis provides a complementary method to the standard techniques of 
Voigh profile fitting (see Ricotti et al. 2000; Schaye et al. 1999) to measure the mean properties of the IGM and its cosmolog- 
ical evolution. The application of our method to real data is developed to a companion paper (RoUinde, Petitjean & Pichon, 
submitted). 

Note finally that, for close enough lines of sight (e.g. multiple lensed QSO images) we might in theory be able to investigate 
the small scale 3D properties of the IGM while accounting for thermal broadening. 



5.4 Redshift distortion 



Recall that in this section, for the sake of simplicity, we assume that the equation of state of the IGM is known. 

There are several issues to address here. The optical depth along a bundle of LOSs does not constrain uniquely the 
corresponding velocity field. This would require the knowledge of the full 3D density distribution together with the assumption 
that linear dynamic applies. Thus, we first investigate how increasing the number of measured LOSs or changing the mean 
separation between them improve s the likelihood of the corresponding realization of the constrained velocity field for a given 
density field along the bundle (§ 5.4.1). We then turn to the problem of deconvolving the optical depth in real spa ce, bu t 
conduct a preliminary analysis on a single LOS. We test two approaches. The first approach is a strong prior inversion (§ 5.4.2), 
i.e. it relies on the Bayesian formalism while assuming that the velocity field takes its most likely value. The second method 
allows the velocity field to flo at aro und this most likely value (§ 5.4.3). Finally, we discuss the limitations of the present work 
and possible improvements 



5.4.4) 



Let us briefly describe the filters and correlation function involved. Fig. ^ (left panel) displays the 3D correlation function, 
Cys{x,x±), measured in simulation S. It is antisymmetric along the LOS, and symmetric orthogonally. The top right panel 
shows the ID filter, /^'"'(x,?/) [equation (0) with £ = £' = L = 1], which was in practice computed according to the 
prescription sketched in Appendix g. This antisymmetric filter presents two characteristic scales: a strong peak at « 2 Mpc 
(comoving) and broad wings up to « 20 Mpc. This implies that the most likely velocity at a given point will depend on the 
local density and also significantly on the density further away (up to ~ 20 Mpc). Transversally the shape of the 3D cross 
correlation function, C„i(2;,xx), which vanishes near the line x = 0, implies that the density away from a given point will 
dominate the local velocity field. 



5.4-1 Most likely velocity versus LOS separation & number of LOSs 

In this subsection we assume temporarily that the log-density field is known along a bundle of LOSs. In the framework of 
constrained random field (§ 3.2), equation (^4|) gives the relationship between the most likely velocity along a given bundle of 
LOSs and the corresponding log-density. 
Let us define the quality factor, Q, as 



where v^ac is the reconstructed velocity. Parameter Q measures the inverse residual misfit in units of the variance for the 
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Figure 6. Left panel: the 3D correlation function, C„f.(x. x i ), measured in simulation S. Top right panel: the filter iC^"' required to 
compute the most likely velocities along one LOS [equation ( |f4| ) with £ = £' = L = 1]. The width of the filter shows that the peculiar 
velocity has two natural scales as discussed in the main text. Bottom right panel: the ID LOS correlation functions: top sub-panel: 
log(C5i) ; middle sub-panel; C^^; bottom sub-panel: Csv 



velocity. We show in Fig. (top left panel) that this number increases with the number of LOSs sampling the sky, as 
expected. However, Q increases as well with the distance between LOSs until it reaches a maximum, which might sound 
confusing. This can be easily understood by examining left panel of Fig. ^ In fact, a bundle of LOSs constrains the transverse 
3D velocity distribution at intermediate scales, as a result of a competition between short range and long range correlations: 

(i) High frequency structures are read from the LOS through the two strong peaks along the x coordinate axis on the left 
panel of Fig. M (at approximately ±0.8 Mpc). Other LOSs can in principle contribute to small scales, but only if they are 
found very close to the LOS of interest (i.e. with xx — 0). 

(ii) Low 3D frequency features are mainly sampled by LOSs away from the LOS of interest, due to the significant tails 
present on C„a at scales as large as ~ 20 Mpc, as illustrated by top right panel of Fig. ^. This effect is three-dimensional, i.e. 
in all directions: it thus provides information on the structures transverse to the LOS. 

(Note that in this discussion, we implicitly assumed that Css — identity in equation (^^. Taking into account the real 
contribution of matrix C^^^ would simply boil down to smoothing the density with an isotropic filter, which has no effect on 
our qualitatives conclusions). 

The competition between effects (i) and (ii) fixes an optimal separation between the LOSs as a function of their number. 
From the top left panel of Fig. |^, we see for example that the optimal separation is 5 Mpc for a bundle of llx 11 LOSs. For a 
bundle with a smaller number of LOSs, the optimal separation becomes larger so that the tails of CvS are still fully sampled 
(but with a sparser binning and thus a smaller quality factor). 

The bottom right and left panels of Fig. Q compare the velocity along one LOS measured in the simulation to the 
reconstructed one by applying equation ( p^ to bundles of various sizes (1x1, 5x5 and 11x11) distributed uniformly on the 
sky (from simulation B), with a mean separation of 2.5 Mpc. With only one LOS, the reconstructed velocity does not account 
in detail for small structures although it seems to match well large scale flows in the example studied here. Increasing the 
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Figure 7. Top left panel: quality of the reconstruction [equation (|32[)] versus LOS separation and number of LOSs. Increasing the 
sampling on the sky decreases the dispersion between the constrained most likely velocity and the measured velocity as discussed in the 
text. Note the saturation for 11x11 LOSs at a separation of 2; 5 Mpc. Top right panel: isocontour for the quality of the reconstruction 
projected on the sky for a bundle of 11 X 11 LOSs, separated by 2.4 Mpc comoving. Note that the reconstruction obviously works better 
for the central LOSs. Bottom left panel: in simulation B, most likely velocity constrained by a single LOS. The solid line on the upper 
sub-panel corresponds to simulated velocity and the dashed one to the reconstructed velocity. The simulated density is displayed on 
lower sub-panel. Bottom, right panel: solid lines: simulated velocities along the center of a bundle of 5 X 5 LOSs or 11 X 11 LOSs; dashed 
lines: corresponding recovered velocities. 

number of LOSs significantly improves tiie reconstruction: witli a bundle of 11x11 LOSs, the reconstruction almost perfectly 
matches the simulation. 

An important outcome of this analysis is that since the optimal separation between LOSs is rather large (a few Mpc), 
the small scale information in the reconstruction is only contained in the LOS of interest. Therefore, having high resolution 
spectra on all the LOSs is not required: a survey dedicated to real space reconstruction should provide a high resolution 
spectrum together with a set of low resolution spectra separated by distances smaller than or of the order of ~ 4 — 5 Mpc 
comoving. Note that Q was computed while averaging over the whole bundle: the quality of the reconstruction in fact depends 
on the position of the LOS in the bundle, as illustrated in the top right panel of Fig. ^ Obviously, the quality factor is optimal 
at the center of the bundle: the high resolution spectrum should be located there. 

We assumed here the 3D covariance matrices needed for the reconstruction were known. In fact, we used the best possible 
guess for them since they were derived from direct measurement in the simulation. In reality we would have to proceed 
iteratively: for a given power spectrum, we could recover the 3D density, compute perturbatively the corresponding 3D 
velocity field and derive a new covariance matrix until convergence is achieved. We have not demonstrated here that this 
procedure is convergent. This is certainly a possible shortcoming of the procedure. 



5.4.2 Strong prior inversion 

Let us now try to deconvolve the density in real space along one LOS. A combination of the general Bayesian method and 
the constrained random field technique is implemented: the constrained random field method allows us to relate the unknown 



© 0000 RAS, MNRAS 000, 000-000 



16 C. Pichon, J.L. Vergely, E. Rollinde, S. Colomhi & P. Petitjean 



field Vp to 7, imposing that the pecuhar velocity takes its most likely value, but the recovery of 7 is still based on the Bayesian 
method. Our model, Qiiy), is now : 

gd7)-Aiz)c, £jDoix)eMli^W~'e^P ('^^ (ir;(;)Lpf;(l")}j2^ ) , (33) 
with the supplementary assumption that the pecuhar velocity in equation equals the most likely velocity (Appendix 



vpix) = {v} = J K^^>{x,yMy)dy, where K'-'"' ix,y) = ^ J e^^^'^'-y' ^^^^dK . (34) 

The unknown parameter remains the density contrast. The prior for the density is chosen as Do = 1 so that 7 = 5. Fo r the filter 
-ftr'"' (a;, y) we use a simple analytic fit of the function {x, y) measured in the simulation as expla ined in Appendix Bl.l . The 
derivation of the different vectors and matrixes involved in this case is sketched in Appendix D2.1. The practicalities involves 
fixing appropriately the parameters {cy~,,^x) in equation (|2^) (^t = 00 for a single LOS) for the minimization procedure 
detailed in Appendix^ to converge while providing as accurate reconstruction as possible. To stabilize the inversion, we need 
to take for a value close to the correlation length, = 1 Mpc. With a larger value of ^x, the inversion is still stable but 
makes the reconstructed density field too smooth, while a smaller value of makes the inversion unstable. The choice of a-y, 
which fixes the amount of variations allowed around the prior, is more delicate. A small value of makes convergence easier, 
but does not leave enough freedom for the reconstructed density to fioat around the prior: voids tend to be filled, and high 
density peaks are not saturated. On the contrary, a large value for a-, allows significant deviations from the prior but makes 
the iteration procedure less stable. For this reason, the reconstruction is carried in two steps. We first take a small value for 
cr^ = 0.0175, and reconstruct the density while using equation ( ^ ) to determine accurately the most likely velocity. Because 
of our choice of a^, the reconstructed density is not as contrasted as it should be, but this does not afi'ect significantly the 
corresponding most likely velocity: it just makes it smoother. In the second step, we fix the most likely velocity at the value 
obtained from the first step. Thus equation (^) is disregarded, and we iterate once more on the density with a larger value of 
Gj, a~f = 0.2, allowing more variations of the density around the new prior -the reconstructed density obtained from the first 
step. The fact that the most likely velocity is fixed indeed makes the inversion more stable and allows larger values of a-y. 

Fig. 1^ illustrates how the method performs on two unsaturated LOSs: the first isolated and the latter nearby a cluster. The 
simulated spectra assume A = 0.39, P = 0.4, T = 10" K, and were calculated after smooth ing th e density and velocity fields 



with a cube of size ~ 200 kpc (2 cells). The errors in the data are modelled as described in § 5.2.2 with {S/N, ctq) = (100, 0.05) 
in equation (^^ . As expected, the reconstructed velocity matchs the original only when there is no significant structure close 
to the LOS, likely to induce large-scale infall contamination. Bottom panels of Fig. |^ show that the reconstructed density 
reproduces well the shape of most structures, except that they are not correctly located along the velocity axis on bottom 
right panel. 

Note that our two-step procedure is similar in spirit to that proposed by Nusser & Haehnelt (1999a), although we use 
same smoothing length in both steps, which allows more small scale features on the reconstructed density. Also, our method 
is not yet able to deal with spectra containing significantly saturated absorption lines: in that case, the inversion is much 
less stable and the reconstructed most likely velocity is often unrealistic, even if the LOS is isolated. Finally, we assumed 
the kernel function K^"\x,y) was known, which should not be the case in reality: a more detailed study of the effects of the 
assumed shape for this function will be needed in the future to fully qualify the method. 



5.4-3 Floating prior for the velocities. 



A less biased representation of the underlying field would be to assume that 7 and Vp are two fields which are statistically 
correlated (by the dynamics) but whose realizations are independent. The model is formally identical to equation (^) with 
the restriction that Vp does not obey equation ( ^ ) anymore. The vector of the model parameters is : M = {'y{x),Vp(x)). The 
correlation between 7 and Vp, Cv-,, is considered to be linear. Recall that the prior variance-covariance matrix, Co, has three 
independent terms, shown in bottom right panel of Fig. ^ 



Co = 



Cud 



(35) 



The penalty function then obeys equation (^), and realizations of the velocity field are entitled to fioat around their most 
likely values, equation dl4h. The corresponding model, g, is sketched in Appendix D2.2. The iterative procedure presented 



in Appendix ^ brings the reduced down from values of about a 100 to 1 ± \Jll'N in a few iterations, but does not 
converge if peculiar velocities induce displacements larger than the effective width of the absorption lines. E ven t hough the 
weak prior inversion is more elegant and easier to implement than the strong prior approach (cf. Appendix D2.2), it seems 
to fail to constrain sufficiently our model when redshift distortion is important. This arises because the effective correlation 
in equation ([22) is too weak to induce convergence. 
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Figure 8. Inversion while accounting for peculiar velocity with strong prior. Simulation S is used to test the method. Two examples 
are considered, according to whether there is a large structure nearby the LOS or not (respectively right and left panels). Top panels: 
the simulated spectra. Middle panels: the simulated (solid line) and most likely (dotted line) peculiar velocity along the LOS. Bottom 
panels: the simulation (solid line) and reconstructed (dotted line) log-density (in logj^Q units). 



5.4-4 Discussion 

A priori, the best approach for reconstructin g the d ensity in redshift space would be to use the exphcit Bayesian method with 



a floating prior for the velocity described in § 5.4.3, However, our preliminary analyses show that this method fails to converge 
when applied to one LOS if redshift distortion becomes of the o rder o f the width of absorption lines, which is unfortunately 
the case in realistic situations. The strong prior inversion of 



5.4.2 



tested again on one LOS, seems to be more reliable, 
but gives accurate reconstruction only if the considered LOS is unsaturated and is isolated from large structures. The only 
reliable way to improve the reconstruction is th erefor e to have more informat ion o n the 3D s tructure of the intergalactic 
medium through bundles of LOSs, as studied in § 5.4.1. The diference between § 5.4.3 and § 5.4.2 would then vanish, since the 



.1. The diference between ; 

discrepency between the most likely velocity and the actual field becomes smaller and smaller, while the correlation between 
the densit y and the v elocity becomes simultaneously tighter and tighter. However, we have not explicitely tested the methods 
of 5 5.4.2 and 5 5.4.3 on several LOSs : this is left for future work. 



6 CONCLUSION 

In this paper, an explicit Bayesian technique and a constrained random field method have been proposed to recover various 
properties of the intergalactic medium from observations of the Lyman-a forest along LOSs to quasars. In particular, our 
preliminary analyses suggest that these methods may be used (i) to recover the large scale 3D topology of the IGM from 
inversion of a network of adjacent LOSs observed at low spectral resolution; (ii) to constrain the physical characteristics of 
the gas from inversion of single LOSs observed at high spectral resolution; (iii) to investigate how the number of and the 
distance between LOSs constrain the projected peculiar velocities; (iv) to correct in part for redshift distortions induced by 
these velocities using either strong or weak priors. 

Both approaches rely on prior assumptions on the covariance of the log-density field and the cross-correlation between 
the log-density field and the peculiar velocity field. 

These methods are used in various regimes: as extrapolation tools to recover the 3D structure of the IGM; as non-linear 
deconvolution tools to correct for blending; as non-parametric field extractors and as model fitting routines to constrain the 
parameters of the equation of state. 



We have demonstrated (§ 3.3) that as far as extrapolation is concerned the standard constraine d ran dom field interpolation 



scheme could be viewed as a specific linear sub-case of the Bayesian inversion scheme presented in § 3.1 , The method presented 



in § B.l is therefore complementary to, and more general than standard constrained random field techniques: it can also cope 
with thermal broadening and finite signal to noise, in a manner similar to Wiener filtering, but allows for non-linear models 
and non-zero mean priors. The correlation functions required for the prior need not be measured in the simulations, and can 
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be postulated. It is more flexible since some level of redshift distortion can in principle be corrected for using the full 3D 
information along the bundle (although we did not demonstrate it explicitly in this paper). It is well suited for this kind 
of problems s ince it deals directly with unknown continuous fields [i.e. the parameter space is the Hilbert space L2; see, 
e.g. equation (D16)]. In contrast with the Lucy-Richardson algorithm, regularization is built in. 

We have shown that temperature inversion is degenerate with respect to two parameters describing the equation of state 
of the gas, the temperature scale factor, T and the effective polytropic index /3. 

Recall that we have assumed in this paper the correlation matrices of the log density to be fixed a priori, together with 
the cross-correlation of the log density and the velocities when dealing with peculiar velocities. When the method is applied 
to real data, we will proceed iteratively and recompute these (cross-) correlations once the 3D reconstruction is achieved. We 
expect this procedure to converge and that the convergence limit will not depend too strongly on the initial prior. 

A thorough analysis of the various biases involved in the methods presented here is postponed to a companion paper 
which will investigate statistically the properties of the reconstructed fields and the degeneracies involved in recovering the 
density and the temperature while relying on numerical hydrodynamical simulations. Since this inversion method relies on 
existing cross correlation between the density and the velocity fields, it should still be applicable on scales where dark matter 
dynamics is less relevant, so long as such correlations exist. We have left aside for now the simultaneous true 3D deconvolution 
of both the temperature and the peculiar velocities. 



ACKNOWLEDGMENTS 

We thank F. Bernardeau, E. Thiebaut and D. Pogosyan for many discussions and an introduction to constrained random field 
theory. JLV and PPJ also thank Bob Carswell for useful discussions. JLV was supported in part by the EC TMR network 
"Galaxy Formation and Evolution" and the Centre de Donnees astronomique de Strasbourg. This work was supported by the 
Programme National de Cosmologie. 



REFERENCES 

Abel T., Haehnelt M., 1999, ApJ 520, L13 

Adier R.J., 1981, The Geometry of Random Fields (Chichester: Wiley) 

Alimi J.-M., Bouchet F.R., Pellat R., Sygnet J.-F., Moutarde F., 1990, ApJ 354, 

Backus G., Gilbert F., 1970, Phil. Trans. R. Soc. London 266, 123 

Bardeen J. M., Bond ,T R... Kaiser N.. Sz alav A.S., 1986, ApJ 304, 15 



Bertschinger E., 1995, a.stro-ph/9506070 



Bi HongGuang, Davidsen A.F., 1997, ApJ 477, 579 

Bond J.R., Wadsley J.W., 1998, in 13th TAP meeting Proc, Eds. P. Petitjean & S. Chariot (Editions Frontieres, Paris), p. 143 
Bouchet F.R., Adam J.-C, Pellat R., 1985, A&A 144, 413 

Boulade O., et al., 1998, in Optical Astronomical Instrumentation, Ed. S. D'Odorico, Proc. SPIE 3355, p. 614 
Gen R., Miralda-Escude J., Ostriker J. P., Rauch M., 1994, ApJ 437, L9 

Colombi S., 1996, in Dark Matter in Cosmology, Quantum Measurements, Experimental Gravitation, Proc. of the XXXIst Rencontres 

de Moriond, Eds. R. Ansari, Y. Giraud-Heraud & Tran Thanh Van (Editions Frontieres: Gif-sur-Yvette, France), p. 199 
Colombi S., Bouchet F.R., Schaeffer R., 1994, A&A 281, 301 

Craig I.J.D., Brown J.C., 1986, "Inverse Problems in Astronomy", Adam Hilger Ltd., Bristol and Boston 

Croft R.A.C., Weinberg D.H., Katz N., Hernquist L., 1998, ApJ 495, 44 

Croft R.A.C., Weinberg D.H., Pettini M., Hernquist L., Katz N., 1999, ApJ 520, 1 

Crotts A.P.S., Fang Y., 1998, ApJ 502, 16 

Dinshaw N., Foltz C.B., Impey CD., et al., 1995, Nature 373, 223 
D'Odorico V., Cristiani S., D'Odorico S., et al., 1998, A&A 339, 678 
Folkes S., et al., 1999, MNRAS 308, 459 
Gnedin N., Hui L., 1998, MNRAS 296, 44 

Hamilton A.J.S., Kumar P., Lu E., Matthews A., 1991, ApJ 374, LI 
Hernquist L., Katz N., Weinberg D.H., Miralda-Escude J., 1996, ApJ 457, L51 
Hivon E., 1995, Ph.D. thesis, Universite Paris XI 

Hockney R.W., Eastwood J.W., 1981, Computer Simulation Using Particles (New York: McGraw Hill) 

Hoffman Y., Ribak E., 1992, ApJ 384, 448 

Hui L., 1999, ApJ 516, 519 

Hui L., Gnedin N.Y., 1997, MNRAS 292, 27 

Hui L., Gnedin N.Y., Zhang Y., 1997, ApJ 486, 599 

Hui L., Rutledge R.E., 1999, ApJ 517, 541 

Hui L., Stebbins A., Buries S., 1999, ApJ 511, L5 

Impey CD., Foltz C.B., Retry C.E., Browne I.W.A., Patnaik A.R., 1996, ApJ 462, L53 
Jenkins A., et al., 1998, ApJ, 499, 20 

Le Fevre O., et al., 1998, in 14th lAP meeting Proc, Wide Field Surveys in Cosmology, Eds. S. Colombi, Y. McUier & B. Rabban (Paris: 

Editions Frontieres), p. 327 
Longuet-Higgins M.S., 1957, Phil. Trans. Roy Soc. London A, 249, 321 
Lucy L., 1974, AJ 79, 745 



© 0000 RAS, MNRAS 000, 000-000 



Inversion of the Lyman-a forest: 3D investigation of the intergalactic medium 19 



Meiksin A., Madau P., 1993, ApJ 412, 34 

Miralda-Escude J., Cen R., Ostriker J. P., Rauch M., 1996, ApJ 471, 582 
Moutarde F., Alimi J.-M., Bouchet F.R., Pellat R., Ramani A., 1991, ApJ 382, 377 
Miicket J. P., Pctitjean P., Kates R., Riediger R., 1996, A&A 308, 17 
Nusser A., Haehnelt M., 1999a, MNRAS 303. 179 
Nusser A., Haehnelt M., 1999b, a.stro-ph/9906406| 



Peacock J. A., Dodds S.J., 1996, MNRAS 280, 19P 
Peebles P.J.E., 1980, The Large-Scale Structure of the Universe (Princeton University Press), p. 153 
Petitjean P., Webb J.K., Rauch M., et al., 1993, MNRAS 262, 499 
Petitjean P., Mucket J., Kates R.E., 1995, A&A 295, L9 

Petitjean P., Surdej J., Smette A., Shaver P., Miicket J., Remy M., 1998, A&A 334, L45 
Pichon C, Thiebaut E., 1998, MNRAS 301, 419 
Press W.H., Rybicki G.B., 1993, 418, 585 
Rahman A., 1964, Phys. Rev. A., 136, 405 
Rauch M., Haehnelt M., 1995, MNRAS 275, 76 

Rauch M., Miralda-Escude J., Sargent W.L.W., et al., 1997, ApJ 489, 7 

Reisenegger A., Miralda-Escude J., 1995, ApJ 449, 476 

Rice S.O., 1944, Bell System Tech J. 23, 282 

Rice S.O., 1945, Bell System Tech J. 24, 41 

Ricotti M., Gnedin N., ShuU J.M., 2000, ApJ 534, 41 

RoUinde E., Petitjean P., Pichon C, Physical Properties of the Lyman-a Forest from the Inversion of the HE 1122-1628 Quasar Spectrum, 

submitted to A&A. 
Schaye J., Theuns T., Leonard A., Efstathiou G., 1999, MNRAS 310, 57 
Scoccimarro R., 1998, MNRAS, 299, 1097 

Smette A., Robertson J.G., Shaver P.A., et al., 1995, A&AS 113, 199 
Szalay A., 2000, Internal Astronomical Union Symposium 204, 16 

Wiener N., 1949, in Extrapolation and Smoothing of Stationary Time Series (Wiley: New York) 
Tarantola A., Valette B., 1982a, Journal of Geophysics 50, 159 

Tarantola A., Valette B., 1982b, Reviews of Geophysics and Space physics 20, 219 
Theuns T., Leonard A., Schaye J., Efstathiou G., 1999, MNRAS 303, L58 
Theuns T., Schaye J., Haehnelt M., 2000, MNRAS,315,600 
Valageas P., Schaeffer R., Silk J., 1999, A&A 345, 691 

Vergely J.L., Freire Ferrero R., Siebert A., Valette B., 2001, A&A, 366,1016 

Weinberg D.H., 1999, in proc. of ESO/MPA conf. Evolution of Largo Scale Structure: from Recombination to Garching, Eds. A.J. Banday, 

R.K. Sheth & L.N. da Costa (ESQ: Garching), p. 346 
Weinberg D.H., Croft R.A.C., Hernquist L., Katz N., Pcttini M., 1999, ApJ 522, 563 
Zel'dovich Ya.B., 1970, A&A 5, 84 

Zaroubi S., Hoffman Y., Fisher K.B., Lahav, O., 1995, ApJ 449, 446 
Zhang Yu, Anninos P., Norman M.L., 1995, ApJ 453, L57 

APPENDIX A: MINIMIZATION PROCEDURE. 

In this section we sketch an iterative procedure leading to the optimization of the posterior probabihty of the model for a 
given data set in equation (^o|). The minimum of the argument of the exponential in equation ( p^ is shown by a simple 
variational argument (Tarantola and Valette, 1982a; 1982b) to obey the implicit equation: 

(M) = Mo + Co • • (Cd G • Co • G^)"' • (D G • ({M) - Mq) - <7({M))) , (Al) 
with G, the matrix of partial derivatives: 

This minimum is found using an iterative procedure: 

M[,+i, = Mo + Co ■ G[fe]^ ■ (Cd + G[fe] ■ Co • G^^^^r' ■ (D + G[,j ■ (M[fe] - Mo) - ff(M[,j)) , (A3) 

where subscript [k] refers to the iteration order. In this scheme the minimum corresponds to M = M[oo] and in practice is 
found via a convergence criterion on the relative changes between iteration [fc] and [fc -I- 1] . For the sake of numerical efflciency, 
rather than inverting (Cd + G[fc] ■ Co • G[fe]^), we solve for the vector Wf^j satisfying 

S[fci-W[fe, = (D-f G[fe, ■(M[fc]-Mo)-3(M[fe,)), where Sy^] = + Gy^^y Co ■ Gy^]^ , (A4) 
and iterate: 

M[fc+i, =Mo + Co-G[fc,^-W[fe]. (A5) 

From now on, we drop the subscript [k\. Once the maximum of equation (|l^) has been reached, an approximation of the 
internal error made on the parameter estimation is derived from a second order development of the posterior distribution 
function in the vicinity of the solution : 
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Cm = Co - Co • ■ S"' ■ G- Co. (A6) 

The high spatial frequency fluctuations are lost in the inverse process because of limited resolution and finite signal to noise 
ratio. The prior correlation function therefore plays an important role to transform an ill-posed problem into an invertible 
one. How is the density information degraded in the spectra? This question can be addressed via the resolving kernel, R, 
introduced for the first time by Backus and Gilbert (1970) and which gives the spread of the density estimation at a given 
position. Suppose that we know the true model, Mtrue. T he d ata can be written: D = g(Mtruc) . Approximating locally 



operator g near its minimum as a linear operator, equation (Al) yields 



(M) - Mo = Co ■ G^ ■ S-^ • G • (Mtruc - Mo) = R ■ (Mu-uc - Mo) , (A7) 
which defines the resolving kernel R( low band pass filter. 



APPENDIX B: CONSTRAINTS RANDOM FIELDS & MULTIPLE LINE OF SIGHTS 



As a though experiment, let us assume that we know the density contrast 5 on n points and ask what the corresponding most 
likely velocity (or density) at points labeled k — 1 ■ ■ ■ p, vok is. We shall not assume that the densities • • • , 5„ are necessarily 
along the same LOS nor that the quantity vjk is sought along any of these. Let X — [vji, ■ • ■ , zup, 5i, • • ■ , Sn]. We define 



C = 



(zUlZUp) 

(zuiSi) 



{zUlZUp) {zuiSi) 

(ujpUJp) (uJpSl) 

(WpSl) (SlSl) 

(ujpSn) {Sl5„) 



(ujpSn) 
{<5l<5n) 

(SnSn) 



(Bl) 



so that Cmw is the p x p autocorrelation matrix of the sought field, Css is the n x n autocorrelation matrix of the known 
density field, and C^s is the p x n cross-correlation matrix of the sought field with the density field. The joint probability of 
achieving velocity zut and density profile Si, - ■ ■ ,5n is given by 



p(X)d" ''X = p{wi, • • • , TOp, 5i, - ■ ■ , Sn)dwi ■ ■ ■ dwpdSi ■ ■ ■ dSn = exp 
The argument of the exponential can be rearranged as 



4 E i^-')..^^^" 



^ a ,6— 1 ■ - -n+p 



d^+PX 



^(27r)"+PdetlC| 



C^i Cs5 



{■uj - Cn,s ■ Cgl ■ 5) + rest(B2) 



where "rest" stands for terms independent of = {zui ■ ■ ■ u?p). Applying Bayes' theorem, the conditional probability of -cj, 
given the density profile {Si, - ■ ■ ,5n), obeys 

p{vji, ■ ■ ■ ,vjp\5i, - ■ ■ , 5„)dci7i ■ • ■ dzzip = p{ta?i, ■ ■ ■ t^p,5i, - ■ ■ , Sn)/p{Si, ■ ■ ■ , 5n)dzui ■ ■ ■ dujp , 
which in turns implies that 



p(toi, • • • ,-cup\5i,- ■ ■ ,5„) cx exp 



(^Cww C^^5 ■ C^^ ■ C-ws ^ ■ ^"Ct? C^s ■ C^^ ' ' 

since p{Si, ■ ■ ■ , Sn) is independent of vj. The maximum of the conditional probability is therefore reached for (zu) given by 
(■uj) = Cu,s ■ Csl ■ 5 . (B3) 



Bl Peculiar velocity-density relation. 

Let us now be more specific about ruk and assume, in this subsection, that we are seeking the most likely peculiar velocity 
field, Vk, where we dropped the subscript p referring to "peculiar". 



Bl.l One line of sight 

Recall that nothing has been said about the relative position of the Si and the Vk at this stage. Let us now assume for a while 
that the subscript i refers to a regular ordering along the LOS, so that Si = S{iAx), and Vi = v{iAx). Let us also introduce 
the intermediate field, u = (iii)^^j...„ = C^^^ • S, so that equation (B3) reads 



(v) = C.„i ■ u , 



Css ■ u . 



(B4) 



© 0000 RAS, MNRAS 000, 000-000 



Inversion of the Lyman-a forest: 3D investigation of the intergalactic medium 21 



Multiplying both sides of equation 



i 



) by Ax, we get 
/ u[jAx]{v[jAx]S[{i - j)Aa;])Aa:: = {v[iAx]) Ax , 



(Cs. 



■ UiAx 



E 

i 



u[jAx]{S\jAx]5[{i - j)Ax])Ax = S[iAx]Ax . 



In the limit of Ax going to zero, equation (B5) reads 

(^S{x — x')v{x')'^u{x') dx' = {v{x)) Ax and / (^S{x — x')5{x')'^u{x') dx' = S{x)A 



(B5) 



(B6) 



(B7) 



Transforming equation (Bf)) in Fourier space leads to 

where Pss.mikx) and PvS,iD{kx) are respectively the ID density power spectrum and the ID mixed velocity density power 
spectrum, while 5{kx) and {v}{kx) are the Fourier transform of 5{x) and {v}{x) respectively. Here the ID power spectra satisfy 



PM,iD(fc.) = / P3D(k)VKj(k)d'fcx and P^s,iD{kx) = 



P3D(k)fe^ n N j2, 

I^X \ I 



(B8) 



where P3D (k) is the 3D power spectrum of the density contrast while Wj (k) is a window function whose characteristic scale 
Rj should be the Jeans length, but is chosen here to be the maximum of the Jeans length and the sampling scale. Indeed, 
below this latter scale no information is to be derived from the data. Note that the direct inversion of equation (B3) may 
lead to si gnifi cant aliasing if the power spectrum has energy beyond the cutoff frequency 1/Rj. The power spectrum ratio in 
equation (B7) is an antisymmetric filter which relates the most likely velocity field to a given density field in linear theory. 
Equation (B7) can be transformed back into real space as 



{v){x) = / K^''\x,x')5{x')dx' , 



where if'"' (x, x') = — / e'*^""'^ 
2n J 



') PvS,lD{kx) 

Pss,iu{kx) 



(B9) 



This filter is illustrated in Fig. ^ Equation (B9) could be used to derive if'"' (x, x') from perturbation theory in the weakly 
non-linear regime given an initial power spectrum. In practice, this filter is constructed here from the simulation in the 
following manner: for each LOS in the simulation, we compute the FFT of the over-density and of the velocity; we multiply 
one by the complex conjugate of the other and repeat the operation on the whole box; we then average over the box (using a 
bundle of 60 x 60 LOSs) and FFT transform back in real space: this yields equation (p3!J). 



B1.2 Multiple lines of sight 



Let us now turn to the more general problem of multiple LOSs. How can we take advantage of larger scale information on 
multiple LOSs to constrain the velocity along the measured LOSs? 

To conduct the calculation which follows, we order the Si, - ■ ■ ,5„ where n — Lp so that the first p corresponds to the first 
line of sight, the next p to the second line of sight and so on for the £ — 1- ■ ■ L line of sights. Our purpose here is to account for 
the fact that in realistic situations, the LOSs distribution on the sky is not necessarily uniform and that the volume covered 
by all LOSs is rather elongated (i.e. L <^ p). For the sake of numerical efficiency, we Fourier transform along the longitudinal 
direction and are left with a matrix representation for the two transverse dimensions. We write each block in Fourier space 
in terms of the corresponding ID power spectra (this is possible since both Fourier transform and matrix multiplication are 
linear operations which therefore commute when applied on different directions); following the derivation of equation (B7) we 
find 



(v) = H . A- 



(BIO) 



where 



Pl^kx 



Plsikx 



and (v) = \v'^{kx), ■ ■ ■ v^{kx 



Pllikx 



Pltikx 



(BU) 



.Pjf(fc.) ... P^fikx). 

\S^{kx), ■ ■ ■ S^{kx)\ , where the superscript refers to the L LOSs. Here 



P"'ikx 



p"Akx) 



exp (ikx . {xx,« - ^±,1'}) -P3D(k) W^j_jj(k) d^kj 



exp (ikx . {xx,j> - xx,r}) W^j,R(k) 



P3D(k)fc, 

fci-f ki 



d^kj 



(B12) 



(B13) 
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The window function, Wj f^{kx,\i.±) involves two scales: the longitudinal Jeans length and the transverse mean inter-LOS 



separation, R. The latter filterin g is required to apodise the inversion. Note that P^^(kx) 
Psv,i'D{kx) are given by equation (B8). Equation (BfO) reads back into real space: 



and P Hkx 



K)>f)>{x,x )5i{x)Ax' , where Kfii{x,x') 



1 

2^ 



'H- A 



, dkx 



(B14) 



where the matrix H • is given in equation ( Bf 1 ). In practice, this filter is also constructed here from the simulation 

following the prescription sketched above: for each bundle of LOSs in the simulation, we compute the FFT of the log density 
and of the velocity; we multiply one bundle by the complex conjugate of the other and re peat the operation on the whole 
box; we then average over the box (using a bundle of 20 x 20 LOSs): this yields the matrix (Bll). The matrix multiplication 
in equation (B14) is carried Fourier mode by Fourier mode, while the inverse Fourier transform is done by FFT. 



B2 3D density-LOSs density relation. 

Let us now assume that vjk refers to the 3D density on a grid of P points at the point xa = {x±,\,x\)\=i...p. No restriction 
on the location of xa along the LOSs applies here. Under these assumptions, the above section translate as: 



(5(='°))(xa) = ^ J Kif\^^,^'i)Si{x')dx' , where if (f > (xa, x^) = ^ ^ exp (ik ■ (xa - x^)) (H3D ■ A"^)^^ d^k(B 



15) 



with A obeying equation (Bll) and 



\kx). 



given P^^{kx) = / exp (i kx • {xx,f - xx,a}) -P3D(k) W^j^;j(k) d k_L , 



We check that when we consider a point on the LOSs, x = (xx,^,^), K^^^\'k,x') — Sd{x — x')5f where 5f stands for the 
Kronecker symbol. 



(B16) 



APPENDIX C: PROPERTIES OF THE SIMULATION 

Note from Table ^ that the simulation boxes are rectangular. This long box technique might be questionable. Indeed, the num- 
ber of modes available in Fourier space is different along each coordinate axis. This anisotropic mode sampling contaminates 
the simulation, and the effect augments with the ratio between the largest and the smallest side of the box. 

One way to test, at least partly, the quality of our A''-body experiments is to compare second-order statistics measured 



in the simulations to theoretical predictions, as illustrated by Fig. CI. Left panel shows the measured power-spectrum, 
P{k) = {\Skf), in the density field smoothed with the procedure described in § ^ Agreement with linear theory is appropriate 
at large scales, as expected. For comparison, we plot as well the result obtained from the non-linear Ansatz of Hamilton et 
al. (1991) optimized for the power-spectrum by Peacock & Dodds (1996). The overall agreement between measurements and 
non-linear theory is quite good, except at large values of k in both simulations. This is mainly the effect of the grid and 
to a lesser extent a consequence of the adaptive Gaussian smoothing. Indeed, any procedure inferring on a grid a density 
from a particle distribution implies some smoothing with a window of approximately the mesh cell size. This induces large-fc 
damping of the power-spectrum. Here, the smoothing is not well defined, but most of the particles are in dense regions, due 
to non-linear clustering, and therefore the corresponding smoothing length, £, is likely to be much smaller than the grid size. 
Thus, for most particles, all the contribution to the density is given to the nearest grid point (NGP). As a result, the Gaussian 
adaptive smoothing has a damping effect quite close, although slightly larger, to top hat smoothing with a mesh cell (at least 



for sufficiently evolved stages). This is illustrated by middle panel of Fig. CI which shows the power-spectrum after correction 
for damping due to NGP assignment. Most of the missing power is recovered, as expected, and the agreement with theory 
is much improved. Note that the triangles tend to be slightly above the solid curve in the neighborhood of logj^Q k ~ 0.4. 
This irregularity is not surprising, given the small physical size of simulation S. It is probably associated to a rare event, for 
example an atypical cluster, although this does not show up significantly on Fig. hi. 



Right panel of Fig. CI shows the real space counterpart of the power-spectrum. More precisely, it displays the variance 
of the smoothed density field with a sphere of radius ^ as a function of I. To measure it, we computed the density from the 
particle distribution on a grid twice thinner than the one used to do the simulation, using the cloud-in-cell method (GIG, 
e.g., Hockney & Eastwood 1981). Then we corrected for GIG damping and smoothed with the top hat window of size £ in 
Fourier space. Finally, back in real space, the variance of the density field was computed with the appropriate corrections 
for discreteness (e.g., Peebles 1980), i.e. = {S^) — 1/N, where N is the average particle count in a cell of radius £. The 
scale range considered was Ag < ^ < L/ 4, where L is the smallest dimension of the box and Ag the spatial resolution of the 
simulation. As can been seen in Fig. |Cl| , the agreement with theoretical predictions is quite good, even at £ ~ Ag although the 
effect of softening of the forces is slightly felt at this point. Note as well that the triangles are somewhat shifted up compared 
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Figure CI. Left panel: the power-spectrum measured at z = 2 in the S (filled triangles) and B (open squares) simulations after adaptive 
smoothing, in logarithmic coordinates (wavenumber k is expressed in Mpc~^). It is compared to linear theory (dots) and to non-linear 
ansatz of Peacock & Dodds (1996, solid curve). Middle panel: same as left panel, except that a correction for NGP damping was applied 
to the data prior to measurement of P{k). Right panel: the variance of the smoothed density field with a spherical cell of radius r is 
shown in logarithmic coordinates as a function of r, as explained in the text. 

to the non-linear ansatz (except at very large scales, where finite volume effect contamination reduces the value of a^, e.g., 
Colombi, Bouchet & Schaeffer 1994), as already noticed for the power-spectrum. 



APPENDIX D: IMPLEMENTATION OF THE INVERSE METHOD. 
Dl Neglecting peculiar velocities. 

Dl.l High resolution spectra. 

When the spectral resolution is higher than 100 km/s, thermal broadening cannot be neglected and our model reads 



g«(7) = A{z)ci 



(-Do(a;,xx) exp[7(a::,xx)])° ''^ exp -C2 



■ {Do(x,yi±) exp[7(a;,xx)]) 
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dx Sn{^±-xj_e)d xx , (Dl) 



where a, A{z), ci, C2, /3, -Do(2;,xx) and wu are defined in equations (^)-(0) and equation (p5|). Since the model, M = ^{x,x±) 
is a continuous field, we need to interpret equation (^) in terms of convolutions, and functional derivatives. In particular the 
matrix of partial functional (Frechet) derivatives, G, has the following kernel: 



dgi, 



(G)«(x,xx)= (^"^J {x,x±) = A{z)ciD^ '^(a::,xx) exp [(a - /3)7(x,xx)] -B«(a;,xx)(5D(xx - xx.f 

with 5_d(xx — xx.f) the Dirac delta function accounting for the singular distribution of LOSs and: 
Bu(x,x±) = ((a " (3) + C22(3{wii - a;)^D(7^''(a;, xx) exp [-2/37(x, xx)])S«(a::, xx) , 
where : 

{wu - x)'^ 



S«(3;,xx) = exp -C2 



■ (Do(x, xx) exp [7(2;, xx)])^'' 



The operator, G, defined by equation (D2) contracts over a given field, r;, as: 
(G)ii ■ rj = A{z)ciDq~'^{x, xx) exp [{a - l3)'y{x, xx)] Bit{x, x.±^e)ri{x, x±^t)dx . 



(D2) 

(D3) 
(D4) 

(D5) 



D1.2 Low resolution spectra. 

At low spectral resolution, the model spells 



gui'y) ^ A{z) JjJ (1)0(2;, Xx) exp[7(a::,xx)])"fo (k - TO«) fo(xx - xx,f)da;d^xx , (D6) 

which corresponds to the limit C2 — > cxa in equation ( [D1[ ). The kernel of partial functional derivatives G obeys: 
(G)«(a;,xx) = A(z)aDo{x,x_L,i) exp [Q7(x,xx,f)] Sd {x - w) Sd{x± - xx,(?) . (D7) 



For instance (G • Co ■ G )it,jm in equation (Al) reads 
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A(z)^a^C-,^ (wa, Wj^, xx/ , xj_,,„) Dq , xj_,f)Do (wj™, xx,m) exp [a7(wif , xx,^) + a'y{wjm, xx,m)] • (D8) 

D2 Implementation of the inverse method with peculiar velocities 

D2. 1 Strong prior: peculiar velocity equals most likely velocity 
Restricting ourselves to a unique LOS, our model reads 

gu[l) = A[z)c, r°°{Do{x) exp[7(x)])""^ exp ( -c^j^^^^^-^-^^^^j^) dx , (D9) 
J_^ V {Do(x) exp[j{x)]yl^ J 

where peculiar velocity, Vp{x), equals the most likely velocity 

{vp{x)}= J K^"\x,y)jiy)dy. (DIO) 
The matrix of partial functional derivatives, G,; is defined by its contraction over a given field, rj, as: 

{G),-v= I G^ix)v{x)dx ^ I A^{x)7Jix)dx + I V,ix) \ [ K'-''\x,y)v{y)dy\ dx , (Dll) 



with : 

Mix) = A{z)ciD^-^{x) exp ((a - f3)'yix)) {a - /3 + 2/3c2Do exp(-2/?7(a;)) (w, - x - Vp{x))} E,{x) , (D12) 
V,{x) = 4(z)ciD,<°~^''^ (a;) exp ((a - 2c2 (w, - x ~ Vp{x)) E,{x) , (D13) 

{Wi — X — Vp{x))^ 

- Df{x)eM2Pj{x)), 



E,{x) = exp ( ~C2^^^ — ) • (dw) 



The double integration in the last term of equation (Dll) arises because g is effectively a double convolution. 



D2.2 Weak prior: floating peculiar velocity 

We aim to determine directly the density and the velocity, while assuming the correlations betw een t hese two quantities are 
known. The model is identical to equation (D9), but the peculiar velocity does not obey equation (piol). The matrix of partial 



functional derivatives is : G = {dg/d"f, dg/dvp). The first component of G is given by equation ( |D2[ ). The kernel of the second 
component is computed as follow; 

■p- = A{z)ciD^-'-"\x) exp {{a ^ 3l3)-/{x)) 2c2(w, ~x- Vp{x))E,{x) = £,{x) (D15) 

OVp 



where Ei{x) is given by equation (D14). The matrix G ■ Co ■ G [where MCo is given by equation (p5|)] is computed as follow 



// 



[Ai{x)Aj{y)C^-,{x,y) + A^{x)SJ{y)C^v{x,y) + £i{x)Aj{y)Cv^i{x,y) + £i{x)£j{y)C^v{x,y)] dxdy . (D16) 



Note that this is a double integral to be compared to the quadruple integral involved in the computation of the equivalent 
term in the strong prior method (where contraction already involves a double convolution). 

This paper has been produced using the Royal Astronomical Society/Blackwell Science ETJtX style file. 
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